Tips to prepare code without data at hand

Aleksandra & Clementine

2022-08-23

TIP 1: Create an RStudio project

File -> New Project -> New Directory -> New Project

… and set up a folder structure

Source: Data Carpentry: R for Social Scientists
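The folder structure can also be created from R, so every project starts with the same layout. This is a minimal sketch. The helper `create_folders()` and the folder names (`data_raw`, `data`, `scripts`, `fig_output`) are hypothetical examples, not a prescribed convention.

```r
# Hypothetical helper: create a standard folder layout inside a project.
# The folder names are example choices, not a fixed convention.
create_folders <- function(root = ".",
                           folders = c("data_raw", "data", "scripts", "fig_output")) {
  paths <- file.path(root, folders)
  for (p in paths) {
    # recursive = TRUE also creates missing parent folders;
    # showWarnings = FALSE stays silent if a folder already exists
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Run once from the project root (the working directory of an RStudio project):
# create_folders()
```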

TIP 2: Don’t use absolute paths

Credit: Allison Horst

Choose relative paths!

Credit: Allison Horst
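A short sketch of the difference. The absolute path and the file name `lfs.csv` are made-up examples. `file.path()` builds the relative path, which R resolves from the working directory. In an RStudio project that directory is the project folder, so the same code works on any machine.

```r
# Absolute path (example only): breaks as soon as the project moves
# dat <- read.csv("C:/Users/me/Documents/lfs_project/data_raw/lfs.csv")

# Relative path: resolved from the working directory, which in an
# RStudio project is the project folder
data_file <- file.path("data_raw", "lfs.csv")  # hypothetical file name
print(data_file)
# [1] "data_raw/lfs.csv"

# dat <- read.csv(data_file)  # run once the data is there
```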

TIP 3: Read the data documentation

E.g., CBS and EU-LFS

… and make use of it

  • Create a metadata data frame with information about the variables you are going to use
  • It can include the following fields:
    • Variable names
    • Variable labels
    • Preferred data type of each variable (factor/numeric/date/(…))
    • Data type of each variable in the dataset
    • Coding of missing values

For example, as a tibble:

meta_tibble = tibble::tibble(
  var_name = c('NIGHTWK', 'SATWK', 'SUNWK', 'HWUSUAL'),        # names as in the dataset
  var_label = c('Night work', 'Saturday work', 'Sunday work', 'Hours worked'),
  pref_dtype = c('factor', 'factor', 'factor', 'numeric'),     # type wanted for analysis
  dset_dtype = c('numeric', 'numeric', 'numeric', 'numeric'),  # type in the delivered file
  # list column: one vector of missing-value codes per variable
  missing = list(c('9',''), c('9',''), c('9',''), c('00','99',''))
  )

print(meta_tibble)
## # A tibble: 4 × 5
##   var_name var_label     pref_dtype dset_dtype missing  
##   <chr>    <chr>         <chr>      <chr>      <list>   
## 1 NIGHTWK  Night work    factor     numeric    <chr [2]>
## 2 SATWK    Saturday work factor     numeric    <chr [2]>
## 3 SUNWK    Sunday work   factor     numeric    <chr [2]>
## 4 HWUSUAL  Hours worked  numeric    numeric    <chr [3]>

Or, equivalently, as a data.table:

meta_table = data.table::data.table(
  var_name = c('NIGHTWK', 'SATWK', 'SUNWK', 'HWUSUAL'),
  var_label = c('Night work', 'Saturday work', 'Sunday work', 'Hours worked'),
  pref_dtype = c('factor', 'factor', 'factor', 'numeric'),
  dset_dtype = c('numeric', 'numeric', 'numeric', 'numeric'),
  missing = list(c('9',''), c('9',''), c('9',''), c('00','99',''))
  )

print(meta_table)
##    var_name     var_label pref_dtype dset_dtype missing
## 1:  NIGHTWK    Night work     factor    numeric      9,
## 2:    SATWK Saturday work     factor    numeric      9,
## 3:    SUNWK   Sunday work     factor    numeric      9,
## 4:  HWUSUAL  Hours worked    numeric    numeric  00,99,
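With the metadata in a table, code can look up what it needs instead of hard-coding it. The sketch below uses a plain base R `data.frame` with a list column, mirroring the tables above, so that it runs without extra packages. The object names `meta`, `factor_vars` and `hw_missing` are illustrative.

```r
# Base R version of the metadata (list column added afterwards)
meta <- data.frame(
  var_name   = c("NIGHTWK", "SATWK", "SUNWK", "HWUSUAL"),
  pref_dtype = c("factor", "factor", "factor", "numeric"),
  stringsAsFactors = FALSE
)
meta$missing <- list(c("9", ""), c("9", ""), c("9", ""), c("00", "99", ""))

# Which variables should become factors? -> NIGHTWK, SATWK, SUNWK
factor_vars <- meta$var_name[meta$pref_dtype == "factor"]

# Missing-value codes of one variable; [[ ]] extracts the vector from the list column
hw_missing <- meta$missing[[which(meta$var_name == "HWUSUAL")]]
print(hw_missing)
# [1] "00" "99" ""
```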

Read the data and metadata

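Until the real data arrives, a small fake file with the same variables can stand in for it. This sketch assumes the data will be delivered as a CSV file; all values are made up. Reading every column as character keeps codes such as `00` and blank cells intact. Read as a number, `00` would become `0` and would no longer be recognisable as a missing code.

```r
# No data yet: simulate a tiny file with the variables from the documentation
# (the values below are made up)
fake <- data.frame(
  NIGHTWK = c("1", "3", "9", ""),
  SATWK   = c("2", "", "1", "9"),
  SUNWK   = c("3", "3", "", "9"),
  HWUSUAL = c("40", "00", "99", ""),
  stringsAsFactors = FALSE
)
data_file <- tempfile(fileext = ".csv")
write.csv(fake, data_file, row.names = FALSE)

# Read every column as character, so codes such as "00" and blanks
# are kept exactly as they appear in the file
dat <- read.csv(data_file, colClasses = "character")
str(dat)
```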

Create functions to apply to the entire dataset or to parts of it
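A sketch of such a function: it takes one raw column plus its metadata (missing codes, preferred type) and returns a cleaned column. Because it works on a plain vector, it can be used on a full column or on any subset of rows. `clean_var()` is a hypothetical name, and only the two types used in the metadata above (factor, numeric) are handled.

```r
# Hypothetical helper: recode missing-value codes to NA, then convert
# the column to its preferred data type
clean_var <- function(x, missing_codes, pref_dtype) {
  x[x %in% missing_codes] <- NA
  switch(pref_dtype,
         factor  = factor(x),
         numeric = as.numeric(x),
         stop("Unknown data type: ", pref_dtype))  # fail loudly on new types
}

# Whole column of made-up raw values ...
hours <- c("40", "00", "99", "", "38")
clean_var(hours, c("00", "99", ""), "numeric")
# [1] 40 NA NA NA 38

# ... or only a part of it
clean_var(hours[1:2], c("00", "99", ""), "numeric")
```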

Make use of lapply/apply
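A sketch of both uses, on made-up data. `lapply()` runs a cleaning function over every variable listed in the metadata, and `split()` plus `sapply()` summarises parts of the data. The helper `clean_var()` is hypothetical (it recodes missing codes to NA and converts the type), and the grouping by night work is only an example.

```r
# Metadata (base R) and a made-up raw dataset, all columns still character
meta <- data.frame(
  var_name   = c("NIGHTWK", "SATWK", "SUNWK", "HWUSUAL"),
  pref_dtype = c("factor", "factor", "factor", "numeric"),
  stringsAsFactors = FALSE
)
meta$missing <- list(c("9", ""), c("9", ""), c("9", ""), c("00", "99", ""))

dat <- data.frame(
  NIGHTWK = c("1", "3", "9", "3"),
  SATWK   = c("2", "", "1", "9"),
  SUNWK   = c("3", "3", "", "9"),
  HWUSUAL = c("40", "00", "36", "20"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: recode missing codes to NA, then convert the type
clean_var <- function(x, missing_codes, pref_dtype) {
  x[x %in% missing_codes] <- NA
  switch(pref_dtype,
         factor  = factor(x),
         numeric = as.numeric(x),
         stop("Unknown data type: ", pref_dtype))
}

# Entire dataset: one call per row of the metadata, results written back
dat[meta$var_name] <- lapply(seq_len(nrow(meta)), function(i) {
  clean_var(dat[[meta$var_name[i]]], meta$missing[[i]], meta$pref_dtype[i])
})

# Data parts: mean usual hours per night-work category
# (rows with NIGHTWK = NA are dropped by split())
hours_by_night <- sapply(split(dat, dat$NIGHTWK),
                         function(part) mean(part$HWUSUAL, na.rm = TRUE))
```

Adding a variable then only means adding a row to the metadata; the loop itself stays the same.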

Prepare the data visualisations
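Plotting code can also be written and tested on made-up numbers. This is a sketch with base graphics. `plot_hours()` and the output file are hypothetical, and the fake values exist only to check that the code runs. The function assumes nothing more than a numeric `HWUSUAL` column, so it can be rerun unchanged on the real data.

```r
# Hypothetical plotting function, written before the real data arrives
plot_hours <- function(dat, file) {
  pdf(file, width = 6, height = 4)   # draw into a PDF file instead of the screen
  on.exit(dev.off())                 # always close the device, even on error
  hist(dat$HWUSUAL,                  # hist() drops NA values itself
       main = "Hours worked",
       xlab = "Hours", ylab = "Number of respondents")
  invisible(file)
}

# Made-up numbers, only to test the code
set.seed(1)
fake <- data.frame(HWUSUAL = round(rnorm(200, mean = 36, sd = 8)))
out_file <- file.path(tempdir(), "hours.pdf")
plot_hours(fake, out_file)
```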

When you get the data …
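Once the data arrives, a quick first step could be to compare it with the metadata before running the prepared code. This is a minimal sketch. `check_against_meta()` is a hypothetical helper, and the `COUNTRY` column in the example data is made up to show an undescribed variable.

```r
# Hypothetical check: compare the variable names in the data with the metadata
check_against_meta <- function(dat, meta) {
  absent <- setdiff(meta$var_name, names(dat))  # described but not delivered
  extra  <- setdiff(names(dat), meta$var_name)  # delivered but not described
  if (length(absent) > 0) {
    warning("Not in data: ", paste(absent, collapse = ", "))
  }
  list(absent = absent, extra = extra)
}

meta <- data.frame(var_name = c("NIGHTWK", "SATWK", "SUNWK", "HWUSUAL"),
                   stringsAsFactors = FALSE)
real <- data.frame(NIGHTWK = "1", SATWK = "2", HWUSUAL = "40", COUNTRY = "NL",
                   stringsAsFactors = FALSE)
res <- check_against_meta(real, meta)  # warns: Not in data: SUNWK
```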

Example: CBS data

Discussion